Abstract

This paper presents greedy gossip with eavesdropping (GGE), a novel randomized gossip algorithm 
for distributed computation of the average consensus problem. In gossip algorithms, nodes in the network 
randomly communicate with their neighbors and exchange information iteratively. The algorithms are 
simple and decentralized, making them attractive for wireless network applications. In general, gossip 
algorithms are robust to unreliable wireless conditions and time varying network topologies. In this paper 
we introduce GGE and demonstrate that greedy updates lead to rapid convergence. We do not require 
nodes to have any location information. Instead, greedy updates are made possible by exploiting the 
broadcast nature of wireless communications. During the operation of GGE, when a node decides to 
gossip, instead of choosing one of its neighbors at random, it makes a greedy selection, choosing the 
node which has the value most different from its own. In order to make this selection, nodes need to 
know their neighbors' values. Therefore, we assume that all transmissions are wireless broadcasts and 
nodes keep track of their neighbors' values by eavesdropping on their communications. We show that 
the convergence of GGE is guaranteed for connected network topologies. We also study the rates of 
convergence and illustrate, through theoretical bounds and numerical simulations, that GGE consistently 
outperforms randomized gossip and performs comparably to geographic gossip on moderate-sized random 
geometric graph topologies. 

Index Terms 

Distributed signal processing. Gossip algorithms. Average consensus. Applications of sensor networks 
I. Introduction and Background¹

Distributed consensus is recognized as a fundamental problem of distributed control and signal
processing applications (see, e.g., [4]-[10] and references therein). The prototypical example of a consensus
problem is computation of the average consensus: for a network of $n$ nodes, initially each node $i$ has a
scalar data value, $y_i$, and the goal is to find a distributed algorithm that asymptotically computes the
average, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, at every node $i$. Such an algorithm can further be used for computing linear
functions of the data and can be generalized for averaging vectorial data.

One of the algorithms proposed for solving the average consensus problem is distributed averaging 
[11]. In distributed averaging, every node in the network broadcasts its information at each iteration so 
that neighboring nodes can receive and use this information for their updates. However, with this scheme 
the speed of information diffusion across the network is slow for topologies used to model wireless mesh 
and sensor networks. The information at each node typically does not change much from iteration to 
iteration. Hence, the broadcast medium is not efficiently used. Recently, gossip algorithms have gained 
attention for the computation of average consensus [10], [12]. In contrast to distributed averaging, gossip 
algorithms allow only two neighboring nodes to communicate and exchange information at each iteration. 
Restricting all information exchange to be local in this fashion is attractive from the point of view of 
simplicity and robustness (e.g., to changing topology and unreliable network conditions). 

In this paper we propose a new randomized gossip algorithm, greedy gossip with eavesdropping 
(GGE), for average consensus computation. Unlike previous randomized gossip algorithms, which perform 
updates completely at random, GGE takes advantage of the broadcast nature of wireless communications 
and implements a greedy neighbor selection procedure. We assume a broadcast transmission model such 
that all neighbors within range of a transmitting node receive the message. Thereby, in addition to keeping 
track of its own value, each node tracks its neighbors' values by eavesdropping on their transmissions. At
each iteration, the activated node uses this information to greedily choose the neighbor with which it will 
gossip, selecting the neighbor whose value is most different from its own. Accelerating convergence in 
this myopic way does not bias computation and does not rely on geographic location information, which 
may change in networks of mobile nodes. 

Although GGE is a powerful yet simple variation on gossip-style algorithms, analyzing its convergence
behavior is non-trivial. The main reason is that each GGE update depends explicitly on the values at
each node (via the greedy decision of which neighbor to gossip with). Thus, the standard approach
to proving convergence to the average consensus solution (i.e., expressing updates in terms of a linear
recursion and then imposing properties on this recursion) cannot be applied to guarantee convergence of
GGE. To prove convergence, we demonstrate that GGE updates correspond to iterations of a distributed
randomized incremental subgradient optimization algorithm. Similarly, analysis of the convergence rate of
GGE requires a different approach than the standard approach of examining the mixing time of a related
Markov chain. We develop a bound relating the rate of convergence of GGE to the rate of standard
randomized gossip. The bound indicates that GGE always converges faster than randomized gossip, a
finding supported by simulation results. We also provide a worst-case bound on the rate of convergence
of GGE. For other gossip algorithms the rate of convergence is generally characterized as a function of
the second largest eigenvalue of a related stochastic matrix. In the case of GGE, our worst-case bound
characterizes the rate of convergence in terms of a constant that is completely determined by the network
topology. We investigate the behavior of this constant empirically for random geometric graph topologies
and derive lower bounds that provide some characterization of its scaling properties.

¹Portions of this work were presented in [1], [2], [3].

A. Background and Related Work 

Randomized gossip was proposed in [12] as a decentralized asynchronous scheme for solving the
average consensus problem. At the $k$th iteration of randomized gossip, a node $s_k$ is chosen uniformly
at random. It chooses a neighbor, $t_k$, randomly, and this pair of nodes "gossips": $s_k$ and $t_k$ exchange
values and perform the update $x_{s_k}(k) = x_{t_k}(k) = (x_{s_k}(k-1) + x_{t_k}(k-1))/2$, and all other nodes
remain unchanged. One can show that under very mild conditions on the way a random neighbor, $t_k$, is
drawn, the values $x_i(k)$ converge to $\bar{y}$ at every node $i$ as $k \to \infty$ [11]. Because of the broadcast nature
of wireless transmission, other neighbors overhear the messages exchanged between the active pair of
nodes, but they do not make use of this information in existing randomized gossip algorithms.
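
The pairwise update described above can be sketched in a few lines of Python. This is only an illustrative sketch: the 4-node ring, the initial values, the iteration count, and the function name `randomized_gossip_step` are assumptions made for the example, not part of the original presentation.

```python
import random

def randomized_gossip_step(x, neighbors, rng):
    """One randomized gossip iteration; returns the activated pair (s, t)."""
    s = rng.randrange(len(x))        # node s_k, uniform over all nodes
    t = rng.choice(neighbors[s])     # neighbor t_k, uniform over the neighbors of s_k
    avg = 0.5 * (x[s] + x[t])
    x[s] = x[t] = avg                # pairwise averaging; all other nodes unchanged
    return s, t

# Hypothetical 4-node ring and initial values, used only for illustration.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
x = [4.0, 0.0, 2.0, 6.0]             # the average is 3.0
rng = random.Random(0)
for _ in range(200):
    randomized_gossip_step(x, neighbors, rng)
```

Each step leaves the sum of the values unchanged, which is why the common limit is the average of the initial values.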

The convergence rate of randomized gossip is characterized by relating the algorithm to a Markov 
chain [12]. The mixing time of this Markov chain is closely related to the averaging time of the gossip 
algorithm, and therefore defines the rate of convergence. For certain types of graph topologies, the mixing 
times are small and convergence of the gossip algorithm is fast. For example, in the case of a complete 
graph, the algorithm requires $O(n)$ iterations to converge. However, topologies such as random geometric
graphs [13] or grids are more realistic for wireless applications. Boyd et al. [12] prove that for random
geometric graphs, randomized gossip requires $O(n^2/\log n)$ transmissions to approximate the average
consensus well.²

Motivated by the slow convergence of randomized gossip, Dimakis et al. introduced geographic gossip 
in [10]. Geographic gossip enables information exchange over multiple hops with the assumption that 
nodes have the knowledge of their geographic locations and the locations of their neighbors. It has been 
shown that long-range information exchange improves the rate of convergence to $\Theta(n\sqrt{n/\log n})$ for
random geometric graphs. However, geographic gossip involves overhead due to localization and
geographic routing. Furthermore, the network needs to provide reliable two-way transmission over many hops.
Otherwise, messages which are lost in transit will result in biasing the average consensus computation. 

Recently, other fast gossiping algorithms have also been proposed. Most closely related are the works of Li
and Dai [14] and Jung et al. [15]. Both approaches are based on directing the exchange of information
across the network by constructing lifted Markov chains using knowledge of the geographic locations of
nodes. As an extension to geographic gossip, Benezit et al. [16] have recently proposed averaging along
paths, an algorithm which converges in $O(n)$ transmissions. All of these approaches rely on geographic
information and thus are not suitable to scenarios where nodes are mobile or location information is 
not available. The focus of our work is on providing fast and communication-efficient computation that 
exploits broadcast transmissions rather than geo-location information to gossip quickly. 

Aysal et al. have proposed broadcast gossip, a consensus algorithm that also makes use of the broadcast 
nature of wireless networks [17], [18]. At each iteration, a node is activated uniformly at random to 
broadcast its value. All nodes within transmission range of the broadcasting node calculate a weighted 
average of their own value and the broadcast value, and they update their local value with this weighted
average. Broadcast gossip does not preserve the network average at each iteration. It achieves a low 
variance (i.e., rapid convergence), but introduces bias: the value to which broadcast gossip converges can 
be significantly different from the true average. 

Sundhar Ram et al. introduced a general class of incremental subgradient algorithms for distributed 
optimization in [19]. In this study, the effects of stochastic errors (e.g., due to quantization) on the 
convergence of consensus-like distributed optimization algorithms are investigated. Convergence of their 
algorithm is guaranteed under certain conditions on the errors, but the convergence rates are not
characterized.

²Throughout this paper, when we refer to randomized gossip we specifically mean the natural random walk version of the
algorithm, where the node $t_k$ is chosen uniformly from the set of neighbors of $s_k$ at each iteration. For random geometric graph
topologies, which are of most interest to us, Boyd et al. [12] prove that the performance of the natural random walk algorithm
scales order-wise identically to that of the optimal choice of transition probabilities, so there is no loss of generality.

Nedic and Ozdaglar have also developed a distributed form of incremental subgradient optimization
that generalizes the consensus framework [20]. Our problem formulation is not as general as theirs, but 
with the specific formulation addressed in this paper we achieve stronger results. In particular, our cost 
function has a specific form and, by exploiting it, we are able to guarantee convergence to an optimal 
solution and obtain tight bounds on the rate of convergence as a function of the network topology. 



B. Paper Organization 

The remainder of this paper is organized as follows. Section II introduces the formal definition of
the algorithm. Section III presents two bounds; the first relates the performance of GGE to randomized
gossip and indicates that GGE always outperforms randomized gossip, and the second is a worst-case
upper bound on the rate of convergence of GGE in terms of a topology-dependent constant. Results
from numerical simulations are presented in Section IV. Motivated by these results, we provide a
multi-hop extension to our algorithm in Section V. Section VI summarizes the contributions of the paper and
discusses future work.



II. Greedy Gossip with Eavesdropping (GGE) 

We consider a network of $n$ nodes and represent network connectivity as a graph, $G = (V, E)$, with
vertices $V = \{1, \dots, n\}$, and edge set $E \subseteq V \times V$ such that $(i, j) \in E$ if and only if nodes $i$ and $j$
directly communicate. We assume that communication relationships are symmetric and that the graph is
connected. Let $\mathcal{N}_i = \{j : (i, j) \in E\}$ denote the set of neighbors of node $i$ (not including $i$ itself).

Each node in the network has an initial value $y_i$, and the goal of the gossip algorithm is to use only
local broadcast exchanges to arrive at a state where every node knows the average $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. To
initialize the algorithm, each node sets its gossip value to $x_i(0) = y_i$.

At the $k$th iteration of GGE, a node $s_k$ is chosen uniformly at random from $\{1, \dots, n\}$. (This can be
accomplished using the asynchronous time model described in [21], where each node "ticks" according
to a Poisson clock with rate 1.) Then, $s_k$ identifies a neighboring node $t_k$ satisfying

$$t_k \in \arg\max_{t \in \mathcal{N}_{s_k}} \left\{ \frac{1}{2}\left(x_{s_k}(k-1) - x_t(k-1)\right)^2 \right\}, \tag{1}$$

which is to say, $s_k$ identifies a neighbor that currently has the most different value from its own. This
choice is possible because each node $i$ maintains not only its own local variable, $x_i(k-1)$, but also a
copy of the current values at its neighbors, $x_j(k-1)$, for $j \in \mathcal{N}_i$. When $s_k$ has multiple neighbors whose
values are all equally (and maximally) different from $s_k$'s, it chooses one of these neighbors at random.
Then $s_k$ and $t_k$ perform the update

$$x_{s_k}(k) = x_{t_k}(k) = \frac{1}{2}\left(x_{s_k}(k-1) + x_{t_k}(k-1)\right), \tag{2}$$

while all other nodes $i \notin \{s_k, t_k\}$ hold their values at $x_i(k) = x_i(k-1)$. Finally, the two nodes, $s_k$ and
$t_k$, broadcast these new values so that their neighbors have up-to-date information.
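
A single GGE iteration, consisting of the greedy neighbor selection followed by pairwise averaging, might be sketched as follows. The sketch assumes every node's eavesdropped copies of its neighbors' values are exact, so it reads the shared list `x` directly; the star topology, the values, and the function names `greedy_neighbor` and `gge_step` are hypothetical and chosen only for illustration.

```python
import random

def greedy_neighbor(x, neighbors, s, rng):
    """Return a neighbor of s whose value differs most from x[s] (ties broken at random)."""
    gaps = {t: 0.5 * (x[s] - x[t]) ** 2 for t in neighbors[s]}
    best = max(gaps.values())
    return rng.choice([t for t in neighbors[s] if gaps[t] == best])

def gge_step(x, neighbors, rng, s=None):
    """One GGE iteration; s may be fixed for testing, otherwise it is drawn uniformly."""
    if s is None:
        s = rng.randrange(len(x))
    t = greedy_neighbor(x, neighbors, s, rng)
    x[s] = x[t] = 0.5 * (x[s] + x[t])   # pairwise averaging of the selected pair
    return s, t

# Hypothetical star graph centered at node 0.
neighbors = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
x = [0.0, 1.0, 5.0, 6.0]
s, t = gge_step(x, neighbors, random.Random(1), s=0)
```

With node 0 activated, node 3 holds the most different value, so the pair (0, 3) averages to 3.0 while nodes 1 and 2 keep their values.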

Note that one GGE iteration could be accomplished with just two transmissions since $s_k$ already knows
the values at its neighbors: one broadcast from $s_k$ to $t_k$ notifying it of the change and simultaneously
announcing the new value to all of $s_k$'s neighbors, and one broadcast from $t_k$ to its neighbors to echo the
new value to them. However, in networks with unreliable transmission or systems where nodes periodically
shut off their radios to conserve energy, $s_k$ may miss some transmissions from its neighbors, and thus
may not always have accurate information about their values. In this case, mistakes in calculation at $s_k$ in
the two-transmission scheme just described would introduce errors, biasing the consensus computation.
To make our algorithm more robust and address the case where $s_k$ does not precisely know the values at
all neighbors, we assume a three-transmission version of our scheme throughout the rest of this paper:
one transmission from $s_k$ to $t_k$ to initiate gossiping, one from $t_k$ to its neighbors to inform them of the
new value, and one from $s_k$ to its neighbors to inform them of its new value. We comment further on
this issue in Section IV-C.

A. Initialization 

Calculating the greedy update in (1) requires nodes to know their neighbors' values. Similar to other
randomized gossip algorithms, we assume that at the outset of gossip computation each node $i$ has
already discovered its neighbor set, $\mathcal{N}_i$, but it does not know its neighbors' values. Instead, these values
are learned during an initialization phase. During the initialization phase, the node $s_k$ that is activated at
iteration $k$ chooses $t_k$ randomly from the subset of its neighbors whose values it does not know, rather
than performing a GGE update. Since $s_k$ and $t_k$ broadcast their new values after averaging, the nodes
in their neighborhoods overhear and acquire information accordingly. Once $s_k$ has heard from all of its
neighbors, the initialization process is complete for that particular node and it chooses $t_k$ greedily via
(1) for all subsequent iterations.
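
The partner-selection rule during and after the initialization phase can be sketched as below. The bookkeeping structure `known` (a map from neighbor id to the most recently overheard value) and the function name `choose_partner` are assumptions introduced for this example; the text does not prescribe a particular data structure.

```python
import random

def choose_partner(s, own_value, known, neighbors, rng):
    """Pick the gossip partner for the activated node s.

    known maps neighbor id -> most recently overheard value (hypothetical structure).
    """
    unknown = [t for t in neighbors[s] if t not in known]
    if unknown:
        # Initialization phase: gossip with a neighbor not yet heard from.
        return rng.choice(unknown)
    # All neighbors heard from: greedy choice (the 1/2 factor does not change the argmax).
    gaps = {t: (own_value - known[t]) ** 2 for t in neighbors[s]}
    best = max(gaps.values())
    return rng.choice([t for t in neighbors[s] if gaps[t] == best])

neighbors = {0: [1, 2, 3]}
rng = random.Random(0)
own = 0.0
known = {1: 2.0, 3: 9.0}                 # node 0 has not yet heard from node 2
first = choose_partner(0, own, known, neighbors, rng)
own = known[2] = 0.5 * (own + 1.0)       # node 2 is assumed to hold 1.0; both average to 0.5
second = choose_partner(0, own, known, neighbors, rng)
```

In this example the first call returns the unheard neighbor 2, and once its value is known the second call switches to the greedy choice, neighbor 3.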

Figure 1 illustrates the effects of this initialization scheme relative to an idealized initialization scheme
and a naive initialization scheme. In the idealized scheme, each node clairvoyantly knows all of its
neighbors' initial values, and all nodes immediately commence with greedy updates. In the naive scheme,
before any node begins gossiping, all nodes broadcast once (without performing an update) to inform
their neighbors of their starting value. The results shown correspond to a network of 200 nodes.³ Observe
that the proposed scheme and the naive scheme attain similar performance, and both involve an overhead
of roughly $n = 200$ transmissions, which is not substantial relative to the total number of transmissions
required. We prefer the proposed initialization scheme to the naive broadcast approach because it does
not require any scheduling mechanism.



[Figure 1: plot for n = 200, RGG topology; horizontal axis: number of transmissions.]



Fig. 1. A comparison of GGE performance for three different initializations of the algorithm. GGE with broadcast corresponds
to the case where each node broadcasts its value once (without performing any updates) to initialize the values at all neighbors,
before beginning to gossip. GGE with initialization corresponds to the proposed scheme, where, before a node has heard from all
its neighbors, it gossips randomly with one it has not heard from, and after it has heard from all neighbors it performs greedy
updates. Thus, the proposed GGE initialization scheme gives priority to learning neighbors' values during the initialization phase.
GGE ideal assumes that each node clairvoyantly knows its neighbors' values at the outset. Note that GGE with broadcast and
GGE with initialization give similar asymptotic performance (accruing overhead of roughly $n = 200$ transmissions), although
the proposed scheme does not require any scheduling mechanism and during early iterations the proposed scheme actually uses
each transmission to perform an update.



³The topology is generated randomly from the family of random geometric graphs on 200 nodes with the standard connectivity
radius. See Section IV for further details on simulations, and note that the setup used here is the same as that used to generate
Figure 2(a). Although not reported here due to space constraints, we observe similar behavior on other network topologies.



September 9, 2009 



DRAFT 






III. Convergence Analysis 

A. Convergence of GGE 

To derive convergence results, we interpret GGE as a randomized incremental subgradient method [22].
Consider the constrained optimization problem,

$$\min_{x} \sum_{i=1}^{n} f_i(x) \quad \text{subject to } x \in \mathcal{X},$$

where we assume that each $f_i(x)$ is a convex function, not necessarily differentiable, and $\mathcal{X}$ is a non-empty
convex subset of $\mathbb{R}^n$. An incremental subgradient algorithm for solving this optimization problem
is an iterative algorithm of the form:

$$x(k) = \mathcal{P}_{\mathcal{X}}\left[x(k-1) - \alpha_k\, g(s_k, x(k-1))\right], \tag{3}$$

where $\alpha_k > 0$ is the step-size, $g(s_k, x(k-1))$ is a subgradient⁴ of $f_{s_k}$ at $x(k-1)$, and $\mathcal{P}_{\mathcal{X}}[\cdot]$ projects its
argument onto the set $\mathcal{X}$. The algorithm is randomized when the component updated at each iteration,
$s_k$, is drawn uniformly at random from the set $\{1, \dots, n\}$, and is independent of $x(k-1)$. Intuitively,
the algorithm resembles gradient descent, except that instead of taking a descent step in the direction of
the gradient of the cost function, $f(x) = \sum_{i=1}^{n} f_i(x)$, at each iteration we focus on a single component,
$f_i(x)$. The projection, $\mathcal{P}_{\mathcal{X}}[\cdot]$, ensures that each new iterate $x(k)$ is feasible. Under mild conditions on
the sequence of step sizes, $\alpha_k$, and on the regularity of each component function $f_i(x)$, Nedic and
Bertsekas have shown that the randomized incremental subgradient method described above converges
to a neighborhood of the global minimizer [22].

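GGE is a randomized incremental subgradient algorithm for the problem

$$\min_{x} \sum_{i=1}^{n} \max_{j \in \mathcal{N}_i} \left\{ \frac{1}{2}\left(x_i - x_j\right)^2 \right\} \tag{4}$$

$$\text{subject to } \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \tag{5}$$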

⁴Subgradients generalize the notion of a gradient for non-smooth functions. The subgradient of a convex function $f_i$ at $x$ is
any vector $g$ that satisfies $f_i(y) \geq f_i(x) + g^T(y - x)$. The set of subgradients of $f_i$ at $x$ is referred to as the subdifferential and
is denoted by $\partial f_i(x)$. If $f_i$ is differentiable at $x$, then $\partial f_i(x) = \{\nabla f_i(x)\}$; i.e., the only subgradient of $f_i$ at $x$ is the gradient.
A sufficient and necessary condition for $x^*$ to be a minimizer of the convex function $f_i$ is that $0 \in \partial f_i(x^*)$. See [22] and
references therein.
where $y_i$ is the initial value at node $i$. The objective function in (4) has a minimum value of 0, which is
attained when $x_i = x_j$ for all $i, j$. Thus, any minimizer is a consensus solution. Moreover, the constraint
$\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$ ensures that the unique global minimizer is the average consensus.

To connect the GGE update, (2), and the incremental subgradient update, (3), observe that $g(k)$, the
subgradient of $f_{s_k}(x(k-1)) = \max_{j \in \mathcal{N}_{s_k}} \{\frac{1}{2}(x_{s_k}(k-1) - x_j(k-1))^2\}$, is defined by

$$g_i(k) = \begin{cases} x_{s_k}(k-1) - x_{t_k}(k-1) & \text{for } i = s_k, \\ -\left(x_{s_k}(k-1) - x_{t_k}(k-1)\right) & \text{for } i = t_k, \\ 0 & \text{otherwise.} \end{cases} \tag{6}$$

Here the subscript $i$ indexes the components of the vector $g$. Fixing a constant step size $\alpha_k = \frac{1}{2}$, the
update (3) is identical to (2). The recursive update for GGE thus has the form

$$x(k) = x(k-1) - \frac{1}{2}\, g(k). \tag{7}$$

Note that the projection is unnecessary since the choice $\alpha_k = \frac{1}{2}$ ensures that the constraint
$\sum_{i=1}^{n} x_i(k) = \sum_{i=1}^{n} y_i$ is satisfied at each iteration.
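
The equivalence between the subgradient step with step size 1/2 and the pairwise averaging update can be checked numerically. The following is a minimal sketch with hypothetical values and function names (`gge_subgradient`, `subgradient_step`); the pair (s, t) is fixed by hand rather than selected greedily.

```python
def gge_subgradient(x, s, t):
    """Subgradient vector with entries +d at s, -d at t, and 0 elsewhere, d = x[s] - x[t]."""
    g = [0.0] * len(x)
    d = x[s] - x[t]
    g[s] = d
    g[t] = -d
    return g

def subgradient_step(x, s, t, alpha=0.5):
    """Unprojected subgradient step x - alpha * g."""
    g = gge_subgradient(x, s, t)
    return [xi - alpha * gi for xi, gi in zip(x, g)]

x = [4.0, 0.0, 2.0, 6.0]
x_new = subgradient_step(x, s=3, t=1)
```

Nodes 3 and 1 both move to their average, 3.0, and the sum of all entries is unchanged, so no projection is needed in this example.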

Nedic and Bertsekas study the convergence behavior of randomized incremental subgradient algorithms
in [22]. For a constant step size, their analysis only guarantees the iterates $x(k)$ will reach a neighborhood
of the optimal solution: with probability 1, $\min_k f(x(k)) \leq \alpha n C^2/2$, where $C \geq \|g(s_k, x(k))\|$ is an
upper bound on the norm of the subgradient [22]. We wish to show that $x(k)$ actually converges to the
average consensus, $\bar{x}$, the global minimizer of our optimization problem, and not just to a neighborhood
of $\bar{x}$. By exploiting the specific form of the GGE cost function, we are able to prove the following
stronger result.

Theorem 1: Let $x(k)$ denote the sequence of iterates produced by GGE. Then $x(k) \to \bar{x}$ almost surely.

Proof: To begin, we examine the improvement in squared error after one GGE iteration. Expanding
$x(k)$ via the expression (7), we have

$$\begin{aligned}
\|x(k) - \bar{x}\|^2 &= \left\|x(k-1) - \tfrac{1}{2} g(k) - \bar{x}\right\|^2 \\
&= \|x(k-1) - \bar{x}\|^2 - \langle x(k-1) - \bar{x},\, g(k) \rangle + \tfrac{1}{4}\|g(k)\|^2.
\end{aligned}$$

Based on the definition of $g(k)$ in (6), given $s_k$ and $t_k$, we have

$$\|g(k)\|^2 = 2\left(x_{s_k}(k-1) - x_{t_k}(k-1)\right)^2,$$

and, since all components of $\bar{x}$ are equal,

$$\begin{aligned}
\langle x(k-1) - \bar{x},\, g(k) \rangle &= \sum_{i=1}^{n} \left(x_i(k-1) - \bar{x}_i\right) g_i(k) \\
&= \left(x_{s_k}(k-1) - x_{t_k}(k-1)\right)^2.
\end{aligned}$$

Therefore, we have

$$\|x(k) - \bar{x}\|^2 = \|x(k-1) - \bar{x}\|^2 - \frac{1}{4}\|g(k)\|^2 \tag{8}$$

with probability 1, since the expression holds independent of the value of $s_k$ and $t_k$. Recursively applying
our update expression, we find that w.p. 1,

$$\begin{aligned}
\|x(k) - \bar{x}\|^2 &= \|x(k-1) - \bar{x}\|^2 - \frac{1}{4}\|g(k)\|^2 \\
&= \|x(k-2) - \bar{x}\|^2 - \frac{1}{4}\sum_{j=k-1}^{k} \|g(j)\|^2 \\
&\;\;\vdots \\
&= \|x(0) - \bar{x}\|^2 - \frac{1}{4}\sum_{j=1}^{k} \|g(j)\|^2.
\end{aligned}$$

Since $\|x(k) - \bar{x}\|^2 \geq 0$, we must have

$$\sum_{j=1}^{k} \|g(j)\|^2 \leq 4\|x(0) - \bar{x}\|^2$$

w.p. 1, and, consequently, the series $\sum_{j=1}^{k} \|g(j)\|^2$ converges a.s. as $k \to \infty$. Since each term $\|g(j)\|^2 \geq 0$,
this also implies that $\|g(k)\|^2 \to 0$ a.s. as $k \to \infty$. However, by definition, $g(k)$ is the subgradient of
a convex function, and $g(k) = 0$ is both a sufficient and necessary condition for $x(k)$ to be a global
minimizer. Thus, $g(k) \to 0$ a.s. implies that $x(k) \to \bar{x}$ a.s., since $\bar{x}$ is the unique minimizer of (4)-(5).

■
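
The one-step identity used in the proof, namely that each gossip exchange reduces the squared error by exactly one quarter of the squared subgradient norm, can be verified on a small example. The values and helper names below (`sq_error`, `gossip_pair`) are hypothetical and serve only as a sanity check, not as part of the proof.

```python
def sq_error(x):
    """Squared distance from x to the consensus vector whose entries all equal the mean of x."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x)

def gossip_pair(x, s, t):
    """Return a copy of x in which nodes s and t are replaced by their average."""
    y = list(x)
    y[s] = y[t] = 0.5 * (x[s] + x[t])
    return y

x = [4.0, 0.0, 2.0, 6.0]
s, t = 3, 1
g_norm_sq = 2.0 * (x[s] - x[t]) ** 2      # squared norm of the subgradient for the pair (s, t)
before = sq_error(x)
after = sq_error(gossip_pair(x, s, t))
```

Here the squared error drops from 20 to 2, and the decrease of 18 matches one quarter of the squared subgradient norm, 72.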

B. Convergence Rate: GGE vs. Randomized Gossip 

The following theorem establishes a general expression for the bound on the mean-squared error of 
GGE after k iterations. Moreover, it demonstrates that the upper bound on the MSE of GGE is less than 
or equal to the upper bound on the MSE of randomized gossip. 

The GGE updates can also be expressed in the form $x(k) = W^{GGE}(k)\,x(k-1)$, where $W^{GGE}(k)$
is a stochastic matrix with $W^{GGE}_{s_k,s_k}(k) = W^{GGE}_{s_k,t_k}(k) = W^{GGE}_{t_k,s_k}(k) = W^{GGE}_{t_k,t_k}(k) = \frac{1}{2}$, $W^{GGE}_{i,i}(k) = 1$
for all $i \notin \{s_k, t_k\}$, and 0 elsewhere. We denote the application of $k$ successive GGE updates by
$W^{GGE}(1:k) = \prod_{j=1}^{k} W^{GGE}(j)$. Likewise, let $W^{RG}(1:k) = \prod_{j=1}^{k} W^{RG}(j)$ denote the successive
application of $k$ randomized gossip updates. Let $W = \mathbb{E}[W^{RG}(k)]$ denote the expected value of the
randomized gossip matrix, and let $\lambda_2(W)$ denote the second largest eigenvalue of $W$.
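
The matrix form of a single gossip exchange can be illustrated with plain Python lists. This sketch builds the update matrix for a given pair and applies it to a vector; the helper names `gossip_matrix` and `matvec`, the pair (0, 3), and the values are assumptions made for the example.

```python
def gossip_matrix(n, s, t):
    """Matrix with entries 1/2 at (s,s), (s,t), (t,s), (t,t), 1 on the remaining diagonal, 0 elsewhere."""
    W = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    W[s][s] = W[t][t] = W[s][t] = W[t][s] = 0.5
    return W

def matvec(W, x):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

W = gossip_matrix(4, s=0, t=3)
y = matvec(W, [0.0, 1.0, 5.0, 6.0])
row_sums = [sum(row) for row in W]
col_sums = [sum(W[i][j] for i in range(4)) for j in range(4)]
```

The rows and columns of this matrix each sum to one, which is consistent with the update preserving the network average.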

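Theorem 2: Let the algorithm input, $x(0)$, be given, and let $\bar{x}$ denote the corresponding average consensus
vector. After $k$ iterations, the expected mean squared error of GGE is upper bounded as follows:

$$\mathbb{E}\left[\|W^{GGE}(1:k)\,x(0) - \bar{x}\|^2\right] \leq \|x(0) - \bar{x}\|^2 \prod_{i=1}^{k} \left(\lambda_2(W) - \xi_i\right), \tag{9}$$

where $\xi_i = 0$ if $\mathbb{E}\left[\|W^{GGE}(1:i-1)\,x(0) - \bar{x}\|^2\right] = 0$, and otherwise,

$$\xi_i = \frac{\mathbb{E}\left[\frac{1}{2n}\sum_{s=1}^{n} \max_{t \in \mathcal{N}_s} \left(x_s(i-1) - x_t(i-1)\right)^2 - \frac{1}{2n}\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s} \left(x_s(i-1) - x_t(i-1)\right)^2\right]}{\mathbb{E}\left[\|W^{GGE}(1:i-1)\,x(0) - \bar{x}\|^2\right]} \geq 0. \tag{10}$$

Note that $x(i-1)$ is a random quantity determined by the choice of $[s_1\ s_2\ \cdots\ s_{i-1}]$, and the expectation
in (10) is over these variables.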



Remark 1: The analogous expression for randomized gossip is simply [12]

$$\mathbb{E}\left[\|W^{RG}(1:k)\,x(0) - \bar{x}\|^2\right] \leq \|x(0) - \bar{x}\|^2\, \lambda_2(W)^k.$$

(Note that here, the expectation is taken with respect to both random nodes chosen at each iteration,
$[s_1\ s_2\ \cdots\ s_k]$ and $[t_1\ t_2\ \cdots\ t_k]$, whereas in the expressions in the theorem, the only randomness is in
$[s_1\ s_2\ \cdots\ s_k]$.) Since $\xi_i \geq 0$ for all $i = 1, \dots, k$, this implies that the upper bound on GGE is uniformly
upper bounded by the upper bound for randomized gossip, for any $k > 0$ and any input $x(0)$. The
upper bound for randomized gossip is tight; if $v_2$ denotes the eigenvector corresponding to the second-largest
eigenvalue of $W$, then if $x(0) = c\,v_2$ for some constant $c$, the upper bound holds with equality (in
expectation).

Remark 2: The form of the terms $\xi_i$ also provides insight into which scenarios are less favorable for
GGE. In general, we know that randomized gossip is slow to converge on random geometric graphs
[12], and so we hope that $\xi_i > 0$ so that GGE achieves some improvement. Note that the numerator
of $\xi_i$ measures how much larger (on average) a GGE step from $x(i-1)$ is in comparison to the step
taken by randomized gossip from the same location. There are two scenarios where the expression for
$\xi_i$ in (10) evaluates to 0. The first is when $x(i-1) = \bar{x}$, in which case consensus has already been
achieved. The second is when the difference between any two neighbors is constant across the network;
i.e., $(x_s - x_t)^2 = c$ for all $t \in \mathcal{N}_s$ and all $s = 1, \dots, n$. In this setting, being greedy does not provide
any gain, since gossiping with any neighbor provides the same amount of immediate improvement.
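
The comparison described in Remark 2 can be made concrete by computing, for a fixed state, the expected one-step decrease in squared error under greedy selection and under uniform neighbor selection. The sketch below uses a hypothetical 4-node path graph and hand-picked values; the function names `expected_drop_rg` and `expected_drop_gge` are illustrative, and each per-pair decrease is taken to be half the squared difference, consistent with the pairwise averaging update.

```python
def expected_drop_rg(x, neighbors):
    """Expected squared-error decrease of one randomized gossip step from state x."""
    n = len(x)
    total = 0.0
    for s in range(n):
        diffs = [(x[s] - x[t]) ** 2 for t in neighbors[s]]
        total += sum(diffs) / len(diffs)      # uniform average over the neighbors of s
    return total / (2 * n)

def expected_drop_gge(x, neighbors):
    """Expected squared-error decrease of one GGE step from state x."""
    n = len(x)
    total = sum(max((x[s] - x[t]) ** 2 for t in neighbors[s]) for s in range(n))
    return total / (2 * n)

neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # hypothetical path graph
x = [0.0, 1.0, 5.0, 6.0]
gain = expected_drop_gge(x, neighbors) - expected_drop_rg(x, neighbors)

# Constant neighbor differences: greedy selection offers no advantage.
x_flat = [0.0, 1.0, 2.0, 3.0]
gain_flat = expected_drop_gge(x_flat, neighbors) - expected_drop_rg(x_flat, neighbors)
```

For the first state the greedy step is strictly better in expectation, while for the state with equal neighbor differences the two expected decreases coincide.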
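Proof of Theorem 2: We recall the known convergence rate bounds for randomized gossip [12],

$$\mathbb{E}\left[\|W^{RG}(1:k)\,x(0) - \bar{x}\|^2\right] \leq \lambda_2(W)^k\, \|x(0) - \bar{x}\|^2, \tag{11}$$

and the related recursive relationship,

$$\begin{aligned}
\mathbb{E}\left[\|W^{RG}(1:k)\,x(0) - \bar{x}\|^2\right]
&= \mathbb{E}\left[\|W^{RG}(1:k-1)\,x(0) - \bar{x}\|^2 - \frac{1}{2n}\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s} \left(x_s(k-1) - x_t(k-1)\right)^2\right] \\
&\leq \lambda_2(W)\, \mathbb{E}\left[\|W^{RG}(1:k-1)\,x(0) - \bar{x}\|^2\right].
\end{aligned} \tag{12}$$

We can identify an equivalent relationship derived from applying $k-1$ steps of GGE followed by one
step of randomized gossip:

$$\begin{aligned}
\mathbb{E}\left[\|W^{RG}(k)\,W^{GGE}(1:k-1)\,x(0) - \bar{x}\|^2\right]
&= \mathbb{E}\left[\|W^{GGE}(1:k-1)\,x(0) - \bar{x}\|^2 - \frac{1}{2n}\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s} \left(x_s(k-1) - x_t(k-1)\right)^2\right] \\
&\leq \lambda_2(W)\, \mathbb{E}\left[\|W^{GGE}(1:k-1)\,x(0) - \bar{x}\|^2\right].
\end{aligned} \tag{13}$$

With this relationship in hand, we can bound the error of the GGE algorithm by adding and subtracting
the effects of making the $k$-th step a randomized gossip update:

$$\begin{aligned}
\mathbb{E}\left[\|W^{GGE}(1:k)\,x(0) - \bar{x}\|^2\right]
&= \mathbb{E}\left[\|W^{GGE}(1:k-1)\,x(0) - \bar{x}\|^2 - \frac{1}{2n}\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s} \left(x_s(k-1) - x_t(k-1)\right)^2\right] \\
&\quad - \mathbb{E}\left[\frac{1}{2n}\sum_{s=1}^{n} \max_{t \in \mathcal{N}_s} \left(x_s(k-1) - x_t(k-1)\right)^2 - \frac{1}{2n}\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s} \left(x_s(k-1) - x_t(k-1)\right)^2\right] \\
&\leq \left[\lambda_2(W) - \xi_k\right] \mathbb{E}\left[\|W^{GGE}(1:k-1)\,x(0) - \bar{x}\|^2\right].
\end{aligned} \tag{14}$$

Repeated application of this inequality for $i = 1, \dots, k$ yields the bound (9).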



C. GGE Convergence Rate: Worst Case Bounds 

The previous subsection related the performance of GGE to that of standard randomized gossip. Next,
we seek a more direct characterization of the GGE rate of convergence in terms of properties of the
underlying communication topology. We then revisit our comparison to randomized gossip. The rate of
convergence for gossip algorithms is typically quantified in terms of the $\epsilon$-averaging time,

$$T_{ave}(\epsilon) = \sup_{x(0)} \inf\left\{k : \Pr\left(\frac{\|x(k) - \bar{x}\|}{\|x(0) - \bar{x}\|} \geq \epsilon\right) \leq \epsilon\right\}.$$

Other gossip algorithms such as randomized gossip and geographic gossip are easily related to a
homogeneous Markov chain, and $T_{ave}(\epsilon)$ can be shown to scale as a function of the second largest eigenvalue
of $W$. In particular (see Theorem 3 in [12]), $T_{ave}(\epsilon) \leq \frac{3\log\epsilon^{-1}}{\log\lambda_2(W)^{-1}}$. For randomized gossip, the matrix
$W$ depends on the choice of probabilities assigned to each edge in the network and hence depends on
the network topology.

Since, in each iteration of GGE, the greedy decision depends on the gossip values at each node, $x(k)$,
our algorithm cannot be related to a homogeneous Markov chain ($t_k$ depends on $x(k)$). Consequently,
the same machinery cannot be used to characterize the rate of convergence for GGE. The goal of this 
section is to bound the rate of convergence of GGE through alternative means. To this end, our main 
result is the following. 

Theorem 3: Let $G = (V, E)$ denote the graph on which we are gossiping, let $x(k)$ denote the vector
of GGE values after $k$ iterations, and let $\bar{x}$ denote the average vector. Then

$$\mathbb{E}\left[\|x(k) - \bar{x}\|^2\right] \leq A(G)^k\, \|x(0) - \bar{x}\|^2,$$

where $A(G)$ is the graph-dependent constant defined as

$$A(G) = \max_{x \neq \bar{x}} \frac{1}{n}\sum_{s=1}^{n} \left(1 - \frac{\|g_s(x)\|^2}{4\|x - \bar{x}\|^2}\right),$$

where $g_s(x)$ refers to a subgradient of $f_s(x)$, when viewing GGE as an incremental subgradient
algorithm.⁵ Moreover, the $\epsilon$-averaging time for GGE is bounded above by

$$T_{ave}(\epsilon) \leq \frac{3\log\epsilon^{-1}}{\log A(G)^{-1}}.$$

Remark 3: Note that the constant $A(G)$ only depends on the topology of the graph. This constant
plays a role for GGE similar to that played by the second-largest eigenvalue of $W$ for regular gossip
algorithms.

⁵We explicitly note that this constant is a function of the underlying topology by writing $A(G)$. The constant $A(G)$ is
completely determined by the neighborhood structure of the network because the maximization is over all $x$. For a fixed $x$, the
subgradients are determined by the neighborhood structure.

Proof of Theorem 3: The proof of the first part of Theorem 3 is based on an approach
introduced in [23] and developed in [24] for analyzing data-adaptive algorithms. We begin by recalling
the recursion for the mean squared error of GGE after $k$ iterations expressed in (8):

$$\|x(k) - \bar{x}\|^2 = \|x(k-1) - \bar{x}\|^2 - \frac{1}{4}\|g(k)\|^2 = \|x(k-1) - \bar{x}\|^2 \left(1 - \frac{\|g(k)\|^2}{4\|x(k-1) - \bar{x}\|^2}\right),$$

where $g(k)$ denotes the subgradient at iteration $k$ (when viewing GGE as a randomized incremental
subgradient algorithm), and is a random quantity, depending on which node $s_k$ is activated at iteration
$k$. Let $M(k) = \|x(k) - \bar{x}\|^2$ denote the error after $k$ iterations, and let $N(k) = 1 - \frac{\|g(k)\|^2}{4\|x(k-1) - \bar{x}\|^2}$ denote
the amount of contraction at iteration $k$. Using these definitions and some successive conditioning, we
get

$$\begin{aligned}
\mathbb{E}[M(k)] &= \mathbb{E}[N(k)\,M(k-1)] \\
&= \mathbb{E}\big[\mathbb{E}[N(k)\,M(k-1) \mid x(k-1)]\big] \\
&= \mathbb{E}\big[M(k-1)\,\mathbb{E}[N(k) \mid x(k-1)]\big] \\
&\;\;\vdots \\
&= M(0)\,\mathbb{E}\big[\mathbb{E}[N(1) \mid x(0)] \cdots \mathbb{E}[N(k) \mid x(k-1)]\big].
\end{aligned}$$

Note that $A(G)$ is defined in such a way that $\mathbb{E}[N(k) \mid x(k-1)] \leq A(G)$ for all $k$. Therefore, it follows
that

$$\mathbb{E}\left[\|x(k) - \bar{x}\|^2\right] \leq A(G)^k\, \|x(0) - \bar{x}\|^2.$$

Next, we prove the second part of the claim: the bound on e-averaging time. To do this, we will use the 
bound we have just derived to develop an upper bound on Pr(||x(fc) — x|| > e||x(0) — x||), the probability 
that after k iterations we are still more than a factor of e away from the initial error. Applying Markov's 
inequality and the bound we just derived for E[||x(A;) — we have 

Pr(||x(/c) > e||x(0) -x||) = Pr (||x(A;) - xf > e2||x(0) - xf ) 

^ E[||x(fc) -xf] 
~ e2||x(0) — xP 

To get an upper bound on $T_{ave}(\epsilon)$, first note that $\Pr(\|x(k) - \bar{x}\| \ge \epsilon\|x(0) - \bar{x}\|) \le \epsilon$ provided that $k \ge \frac{3\log\epsilon^{-1}}{\log A(G)^{-1}}$. Since the bound on $E[\|x(k) - \bar{x}\|^2]$ in the first part of the theorem is based on a worst-case one-step analysis, it is an upper bound on the mean squared error at iteration $k$ and, effectively, a lower bound on the rate of convergence. Therefore, we have an upper bound on the $\epsilon$-averaging time for GGE; that is, $T_{ave}(\epsilon) \le \frac{3\log\epsilon^{-1}}{\log A(G)^{-1}}$. ■
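
The threshold on $k$ can be checked by a short calculation. The following is a sketch under the assumption that $0 < A(G) < 1$ and $0 < \epsilon < 1$; it starts from the fact that Markov's inequality combined with the first part of the theorem bounds the probability by $\epsilon^{-2}A(G)^k$.

```latex
\begin{align*}
\epsilon^{-2} A(G)^k \le \epsilon
  &\iff A(G)^k \le \epsilon^{3} \\
  &\iff k \log A(G) \le 3 \log \epsilon \\
  &\iff k \ge \frac{3 \log \epsilon}{\log A(G)}
        = \frac{3 \log \epsilon^{-1}}{\log A(G)^{-1}} ,
\end{align*}
```

The last step divides by $\log A(G) < 0$, which reverses the inequality. As a purely illustrative example (these values are not taken from the paper), $A(G) = 0.99$ and $\epsilon = 0.01$ give $k \ge 3\log(100)/\log(1/0.99) \approx 1374.6$, so 1375 iterations suffice under the bound.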

Theorem 3 provides a direct link between the rate of convergence of GGE and the underlying network topology through the constant $A(G)$. This motivates further study of how $A(G)$ behaves for different classes and sizes of network topologies. Next, we derive a lower bound on $A(G)$ as a function of the maximum degree of the network, $d_{\max} = \max_i |\mathcal{N}_i|$. Then we apply this result to characterize $A(G)$ for two-dimensional grid and random geometric graph topologies.

Theorem 4: Let $G$ be a graph with $n$ nodes and maximum degree $d_{\max}$. As above, let $W$ denote the expected update for one step of randomized gossip on $G$. There exists a vector $x \in \mathbb{R}^n$ with corresponding average consensus vector $\bar{x}$ such that

$$\frac{E\|W^{GGE}x - \bar{x}\|^2}{\|x - \bar{x}\|^2} \ge 1 - d_{\max}\,(1 - \lambda_2(W)), \qquad (15)$$

and this implies a lower bound for $A(G)$,

$$A(G) \ge 1 - d_{\max}\,(1 - \lambda_2(W)). \qquad (16)$$

The proof appears below. We can use this result to relate the upper bounds on averaging time for GGE and standard randomized gossip. Let $U^{GGE}(G, \epsilon) = \frac{3\log\epsilon^{-1}}{\log A(G)^{-1}}$ denote the upper bound on the averaging time of GGE obtained in Theorem 3, and let $U^{RG}(G, \epsilon) = \frac{3\log\epsilon^{-1}}{\log\lambda_2(W)^{-1}}$ denote the corresponding upper bound on the averaging time of randomized gossip [12]. Using two inequalities for the logarithm, namely $a \le -\log(1 - a)$ for $a \in (0, 1)$ and $\log\lambda \le \lambda - 1$ for $\lambda > 0$, we obtain

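$$U^{GGE}(G, \epsilon) \ge \frac{3\,A(G)\,\log\epsilon^{-1}}{d_{\max}\,(1 - \lambda_2(W))} \ge \frac{A(G)}{d_{\max}}\,U^{RG}(G, \epsilon). \qquad (17)$$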

In words, the upper bound on the averaging time of GGE is at most a factor of $d_{\max}$ better than the upper bound for randomized gossip. Of course, this only links the upper bounds of the two algorithms and does not directly relate their actual performance. However, simulation results presented in the next section indicate that this relationship indeed captures the improvements seen for GGE over randomized gossip. Moreover, recall that the bounds on the expected improvement after a single gossip iteration are tight for both GGE and randomized gossip. The bound for GGE in Theorem 3 becomes an equality for $k = 1$ when $x(0)$ is taken to be the $x$ that solves the optimization problem defining $A(G)$. Similarly, the bound for randomized gossip becomes an equality when $x(0)$ is taken to be the eigenvector corresponding to the second largest eigenvalue of $W$.
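
The relationship in Theorem 4 can be checked numerically on a small graph. The sketch below is not the authors' code: it assumes randomized gossip activates a node uniformly at random and picks one of its neighbors uniformly, and it uses a hypothetical 8-node ring as the topology. The helper names `expected_rg_matrix` and `gge_one_step_ratio` are introduced here for illustration only.

```python
import numpy as np

def expected_rg_matrix(neighbors):
    """Expected one-step update matrix W for randomized gossip.

    Assumption: node s wakes with probability 1/n, picks t uniformly from
    its neighbors, and both nodes replace their values by the pair average.
    """
    n = len(neighbors)
    W = np.eye(n)
    for s, nbrs in enumerate(neighbors):
        for t in nbrs:
            d = np.zeros(n)
            d[s], d[t] = 1.0, -1.0
            # Pair (s, t) occurs with probability 1 / (n * |N_s|) and
            # applies the update I - d d^T / 2.
            W -= np.outer(d, d) / (2.0 * n * len(nbrs))
    return W

def gge_one_step_ratio(x, neighbors):
    """E||x(1) - xbar||^2 / ||x(0) - xbar||^2 for one GGE iteration from x."""
    n = len(x)
    err0 = np.sum((x - x.mean()) ** 2)
    # Averaging s with its most different neighbor t removes (x_s - x_t)^2 / 2.
    drop = sum(max((x[s] - x[t]) ** 2 for t in nbrs)
               for s, nbrs in enumerate(neighbors))
    return 1.0 - drop / (2.0 * n * err0)

# Hypothetical test topology: a ring of n nodes (not a topology from the paper).
n = 8
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
W = expected_rg_matrix(ring)
eigvals, eigvecs = np.linalg.eigh(W)        # eigenvalues in ascending order
lam2, v2 = eigvals[-2], eigvecs[:, -2]      # second largest eigenpair
d_max = max(len(nbrs) for nbrs in ring)
ratio = gge_one_step_ratio(v2, ring)
lower = 1.0 - d_max * (1.0 - lam2)

print(f"lambda_2(W)                    = {lam2:.4f}")
print(f"GGE one-step ratio at v2       = {ratio:.4f}")
print(f"1 - d_max * (1 - lambda_2(W))  = {lower:.4f}")
```

For any connected graph the printed GGE ratio should lie between the lower bound in (15) and $\lambda_2(W)$, since a greedy update reduces the error at least as much as a uniformly random one.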

We are interested in understanding the performance of GGE for applications primarily in wireless networks. Random geometric graphs, first introduced in [13], are commonly used to model connectivity in wireless networks for the purpose of analyzing scaling behaviour of algorithms. A random geometric graph on $n$ nodes is obtained by assigning each node i.i.d. coordinates drawn uniformly from the unit square and then connecting nodes whose distance is less than the connectivity radius $r(n)$. In this paper we adopt the common scaling $r(n) = \sqrt{\frac{2\log n}{n}}$, which guarantees the network is connected with high probability [13].

For a random geometric graph with $n$ nodes, it is known [12] that, for $r(n)$ as given above, every node has $2\pi\log(n)(1 - o(1))$ neighbors with high probability. Thus, with $d_{\max} = 2\pi\log(n)(1 - o(1))$, we see that GGE gives essentially a factor of $\log n$ improvement in averaging time over randomized gossip. For a two-dimensional grid $d_{\max} = 4$, and GGE gives only a constant improvement in averaging time. These results are illustrated via simulation in the next section.
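
The random geometric graph construction described above can be sketched as follows. This is an illustration rather than the simulation code used in the paper; the function name `random_geometric_graph` and the seed are assumptions made here.

```python
import numpy as np

def random_geometric_graph(n, rng):
    """Place n i.i.d. uniform points in the unit square and connect every
    pair closer than r(n) = sqrt(2 log n / n).

    Returns the points, a boolean adjacency matrix, and the radius.
    """
    r = np.sqrt(2.0 * np.log(n) / n)
    pts = rng.random((n, 2))
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    adj = (dist < r) & ~np.eye(n, dtype=bool)   # no self-loops
    return pts, adj, r

rng = np.random.default_rng(0)   # seed chosen only for reproducibility
n = 200
pts, adj, r = random_geometric_graph(n, rng)
degrees = adj.sum(axis=1)

print(f"r(n)          = {r:.3f}")
print(f"mean degree   = {degrees.mean():.1f}")
print(f"max degree    = {degrees.max()}")
print(f"2*pi*log(n)   = {2 * np.pi * np.log(n):.1f}")
```

At moderate sizes such as $n = 200$, nodes near the boundary of the unit square have fewer neighbors, so the empirical mean degree is expected to fall somewhat below $2\pi\log n$.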

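Proof: [Proof of Theorem 4] Our starting point is (10) from Theorem 2. Since we are focusing on the effect of applying a single gossip iteration, we drop the time index in (10) to simplify the notation:

$$\epsilon = \frac{E\left[\sum_{s=1}^{n} \max_{t \in \mathcal{N}_s}(x_s - x_t)^2\right] - E\left[\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s}(x_s - x_t)^2\right]}{2n\,E[\|x - \bar{x}\|^2]} \ge 0. \qquad (18)$$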

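Using the fact that the maximum of a set of non-negative values is always less than or equal to the sum of those values, we can write

$$\begin{aligned}
E\left[\sum_{s=1}^{n} \max_{t \in \mathcal{N}_s}(x_s - x_t)^2\right]
&\le E\left[\sum_{s=1}^{n} \sum_{t \in \mathcal{N}_s}(x_s - x_t)^2\right] \qquad (19) \\
&\le d_{\max}\,E\left[\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s}(x_s - x_t)^2\right]. \qquad (20)
\end{aligned}$$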



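Therefore we can upper bound $\epsilon$ by

$$\epsilon \le \frac{(d_{\max} - 1)\,E\left[\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s}(x_s - x_t)^2\right]}{2n\,E[\|x - \bar{x}\|^2]}. \qquad (21)$$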

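Next, take $x$ to be the eigenvector corresponding to the second largest eigenvalue of $W$ and equate $W(1:k-1)\,x(0) = x$ in (12) to get

$$E\left[\frac{1}{2n}\sum_{s=1}^{n} \frac{1}{|\mathcal{N}_s|}\sum_{t \in \mathcal{N}_s}(x_s - x_t)^2\right] \le (1 - \lambda_2(W))\,E[\|x - \bar{x}\|^2]. \qquad (22)$$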



Applying this inequality in (21) gives

$$\epsilon \le (d_{\max} - 1)(1 - \lambda_2(W)). \qquad (23)$$

Observe that the one-step bound obtained by taking $k = 1$ in (9) is tight. In particular, for our choice of $x$ as the eigenvector corresponding to the second largest eigenvalue of $W$, we have equality in (9), and therefore

$$E\left[\|W^{GGE}x - \bar{x}\|^2\right] = \|x - \bar{x}\|^2\,(\lambda_2(W) - \epsilon). \qquad (24)$$

Inserting (23) into (24) leads to the first claim in the theorem. Then, combining this bound with the first inequality in Theorem 3 for the case $k = 1$ and $x(0) = x$ yields the desired lower bound on $A(G)$.
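
The algebra behind the final step is short. As a sketch, assume the one-step GGE error for the chosen eigenvector can be written as $(\lambda_2(W) - \epsilon)\|x - \bar{x}\|^2$, where $\epsilon \ge 0$ is the normalized gap between randomized gossip and GGE, and that $\epsilon \le (d_{\max} - 1)(1 - \lambda_2(W))$. Then:

```latex
\begin{align*}
\frac{E\|W^{GGE}x - \bar{x}\|^2}{\|x - \bar{x}\|^2}
  &= \lambda_2(W) - \epsilon \\
  &\ge \lambda_2(W) - (d_{\max} - 1)\,(1 - \lambda_2(W)) \\
  &= 1 - (1 - \lambda_2(W)) - (d_{\max} - 1)\,(1 - \lambda_2(W)) \\
  &= 1 - d_{\max}\,(1 - \lambda_2(W)) .
\end{align*}
```

Since the first part of Theorem 3 with $k = 1$ bounds the same ratio from above by $A(G)$, the lower bound on $A(G)$ follows.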



IV. Numerical Simulations 

In this section we report the results of simulations conducted to compare the performance of GGE with randomized gossip [12] and geographic gossip [10] for a variety of initial conditions. We also compare the empirically achieved convergence rates to the bound established in Section III-C and investigate how this bound behaves as the number of nodes in the network grows.



A. Comparison of Convergence Rates 

We first compare the convergence rates of GGE with randomized gossip and geographic gossip by examining the reduction they achieve in relative error, $\frac{\|x(k) - \bar{x}\|}{\|x(0) - \bar{x}\|}$, as a function of the number of transmissions (communication complexity). Since the number of transmissions per iteration is different for each algorithm, this is a fairer comparison than examining convergence rate relative to the number of iterations. Randomized gossip requires two wireless transmissions per iteration, GGE requires three transmissions (see the discussion in Section II), and geographic gossip has a variable number of transmissions per iteration, depending on the number of hops between the gossiping nodes. We simulate networks with random geometric graph topologies, and all figures show averages over 100 realizations of the random geometric graph. We examine performance for four different initial conditions, $x(0)$, in order to explore the impact of the initial values on performance. The first two of these cases are a Gaussian bumps field and a linearly-varying field. For these two cases, the initial value $x(0)$ is determined by sampling these fields at the locations of the nodes. The remaining two initializations consist of the "spike" signal, constructed by setting the value of one random node to 1 and all other node values to 0, and a random initialization where each value is drawn i.i.d. from a Gaussian distribution $\mathcal{N}(0, 1)$ with zero mean and unit variance. The first three of these signals were also used to examine the performance of geographic gossip in [10].

Figs. 2(a)-(d) show that GGE converges to the average at a much faster rate (both initially and asymptotically) than randomized gossip for all initial conditions. The initial rate of convergence of GGE is faster than geographic gossip for all but the linearly-varying field, and for the simulated network size ($n = 200$), the asymptotic rates of reduction in relative error are similar for the two algorithms. Out of these candidate initializations, the linearly-varying field is the worst case. This is not surprising since the convergence analysis conducted in Section III suggests that constant differences between neighbors cause GGE to provide minimal gain.

[Figure 2 plots, each for n = 200, RGG topology: (a) Gaussian bumps convergence rate comparison; (b) Linearly-varying field convergence rate comparison; (c) Spike convergence rate comparison; (d) Uniform random field convergence rate comparison.]

Fig. 2. A comparison of the performance of randomized gossip, GGE, and geographic gossip for four initializations of $x(0)$. Results are averaged over 100 realizations of the random geometric graph and 100 runs of the algorithm per graph.
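
A comparison of this kind, at an equal budget of wireless transmissions, can be sketched as below. This is a simplified illustration and not the experimental code behind the figures: it assumes perfect knowledge of neighbor values for the greedy choice, uses a hypothetical ring-lattice topology instead of a random geometric graph, and omits geographic gossip. The function name `gossip` and all parameter values are assumptions made here.

```python
import numpy as np

def gossip(x0, neighbors, n_iter, greedy, rng):
    """Run pairwise-averaging gossip; return final values and the relative
    error after each iteration.

    greedy=True  : active node picks the neighbor whose value differs most (GGE).
    greedy=False : active node picks a neighbor uniformly at random.
    """
    x = np.array(x0, dtype=float)
    xbar = x.mean()
    err0 = np.linalg.norm(x - xbar)
    errors = []
    for _ in range(n_iter):
        s = int(rng.integers(len(x)))
        nbrs = neighbors[s]
        if greedy:
            t = max(nbrs, key=lambda j: abs(x[s] - x[j]))
        else:
            t = nbrs[int(rng.integers(len(nbrs)))]
        x[s] = x[t] = 0.5 * (x[s] + x[t])
        errors.append(np.linalg.norm(x - xbar) / err0)
    return x, np.array(errors)

# Hypothetical topology: ring in which each node links to its 3 nearest
# nodes on each side (chosen only to keep the example self-contained).
n = 50
neighbors = [[(i + k) % n for k in (-3, -2, -1, 1, 2, 3)] for i in range(n)]

x0 = np.zeros(n)
x0[0] = 1.0                       # "spike" initialization
budget = 3000                     # total number of wireless transmissions
rng = np.random.default_rng(1)

# Randomized gossip uses 2 transmissions per iteration, GGE uses 3.
x_rg, err_rg = gossip(x0, neighbors, budget // 2, greedy=False, rng=rng)
x_gge, err_gge = gossip(x0, neighbors, budget // 3, greedy=True, rng=rng)

print(f"randomized gossip: relative error after {budget} transmissions = {err_rg[-1]:.3e}")
print(f"GGE:               relative error after {budget} transmissions = {err_gge[-1]:.3e}")
```

Counting transmissions rather than iterations gives randomized gossip more iterations for the same budget, which matches the accounting used in this subsection.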

We also compare the performances of the three gossip algorithms for the grid topology. Figure 3 shows that in grid-structured networks the performance of GGE is close to the performance of randomized gossip (constant improvement). Clearly geographic gossip has the best performance in this case. As discussed in Section III-C, the small number of neighbors in the grid topology restricts the improvement that GGE can achieve relative to randomized gossip.

For random geometric graph topologies, the expected node degree, $E[|\mathcal{N}_s|]$, scales as $\log n$, in contrast to the constant average node degree of the grid topology. Therefore, GGE is able to provide better performance results compared to randomized gossip for graph topologies where average node degree increases with the number of nodes. To improve GGE performance for topologies with low expected node degree values, we propose an extension to the algorithm. Details of this extension, which we call Multi-hop GGE, are provided in Section V.

[Figure 3 plot: n = 196, Grid topology.]

Fig. 3. A comparison of the performance of randomized gossip, GGE, and geographic gossip for two initializations of $x(0)$ in grid topology. Results are averaged over 100 runs of the algorithm.

B. Comparison with the Theoretical Upper Bound 

We now compare the empirical average relative error for the random geometric graph with the bound developed in Theorem 3. There is no closed-form solution for $A(G)$, so we solve the optimization problem identified in Theorem 3 numerically, using an incremental subgradient algorithm. Since the cost function can be expressed as a function of $(x(k) - \bar{x})/\|x(k) - \bar{x}\|$, we can focus on maximizing over $x(k)$ satisfying $\bar{x} = 0$ and $\|x(k)\|^2 = 1$. In this simplified setting, one can reformulate the optimization as the minimization of a convex function over a non-convex set of constraints. We approximate the solution to this minimization using a projected incremental subgradient method. To avoid the problem of local minima (since the constraint set is non-convex) we rerun the optimization algorithm from multiple initial conditions. Figure 4 shows the relative error achieved by GGE as a function of the number of iterations for different initial conditions $x(0)$, averaged over 100 realizations of the algorithm. Also plotted is the bound identified in Theorem 3, substituting in $A(G)$ calculated numerically. For all but the linearly-varying field, GGE achieves a much more rapid initial decrease in error than indicated by the bound. After approximately 1000 iterations, the bound provides a good indication of the rate of decrease in error. We again observe that the linearly-varying field is close to a worst-case scenario for GGE.

Fig. 4. A comparison of the theoretical bound on relative error and the experimental performance of GGE for four initializations. Results are for 100 realizations of the random geometric graph, averaged over 100 runs of the algorithm.



Next, we examine how the communication complexity scales with respect to the number of nodes in the network. Figs. 5(a) and 5(b) display how $A(G)$ and the averaging time scale as a function of the number of nodes $n$, for random geometric graph and grid topologies, respectively. To obtain the random geometric graph curve, we generate 50 random graphs for each value of $n$, and numerically evaluate $A(G)$ for each using the procedure detailed above. The top panel shows how the values of $A(G)$ change as the number of nodes increases. The bottom panel shows the $\epsilon$-averaging time, $T_{ave}(\epsilon)$, evaluated via simulation, for $\epsilon = 0.01$ versus the number of nodes. Note that Figs. 5(a) and 5(b) show the averaging time in terms of the number of iterations per node. The error bars depict the minimum, mean and maximum values obtained for the 50 simulated graphs for each $n$. For reference, the dotted lines depict $1.5\,n/\log n$ for the random geometric graph and $2.5\,n$ for the grid topology.



[Figure 5 plots: (a) RGG topology; (b) Grid topology. Horizontal axis in all panels: Number of nodes (n).]

Fig. 5. The scaling behavior of $A(G)$ and the averaging time $T_{ave}(\epsilon)$ for $\epsilon = 0.01$ as a function of the number of nodes $n$ in the network. The top panels show $A(G)$ evaluated numerically, and the bottom panels compare $T_{ave}(0.01)$ computed via simulation and via the corresponding bound from Theorem 3. (a) 50 random geometric graphs are simulated for each value of $n$. The error bars depict the minimum, mean, and maximum values obtained over these 50 realizations. The curve $1.5\,n/\log n$ is also shown for reference. (b) For the grid topology, error bars are not used since there is only one realization of the grid for a given network size. The curve $2.5\,n$ is also shown for reference.

C. Stale Information

In wireless networks, links can be unreliable, and for GGE, it is possible for nodes to miss some updates from their neighbors. Consequently, nodes will have stale information about their neighbors' values and therefore the greedy selection in GGE may be affected. Here we investigate the effect of stale information on the performance of GGE through a simulation study.

We consider random geometric graph topologies with 200 nodes. The initial measurements $x(0)$ correspond to sampling the Gaussian bumps field, similar to Figure 2(a). As described in Section II, at the $k$th GGE iteration, two nodes $s_k$ and $t_k$ perform averaging. To provide up-to-date information to their neighbors, $s_k$ and $t_k$ broadcast their new values. We simulate the case when nodes randomly miss the broadcast messages. We assume that the gossiping nodes $s_k$ and $t_k$ communicate reliably, but sometimes eavesdropping nodes miss an update from their neighbor. Specifically, each eavesdropping node independently misses the transmission from its neighbor with probability $p$. Figure 6 illustrates the performance degradation in GGE. Curves are shown for four different values of $p$ between 0 and 0.5, and standard randomized gossip is also shown for comparison. We conclude that GGE provides significantly better performance than randomized gossip even when 50 percent of the broadcast messages are missed.
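
The stale-information model can be sketched as follows. This is an illustrative simplification, not the simulation used for Figure 6: it assumes every node starts with correct copies of its neighbors' values, uses a hypothetical ring-lattice topology with symmetric neighbor lists, and draws a random Gaussian initialization instead of the Gaussian bumps field. The name `gge_with_stale_info` and the table `known` are introduced here for illustration.

```python
import numpy as np

def gge_with_stale_info(x0, neighbors, n_iter, p_miss, rng):
    """GGE in which each eavesdropper independently misses a broadcast with
    probability p_miss. known[i][j] is node i's stored copy of x[j].

    Requires symmetric neighbor lists (j in neighbors[i] iff i in neighbors[j]).
    """
    x = np.array(x0, dtype=float)
    n = len(x)
    xbar = x.mean()
    err0 = np.linalg.norm(x - xbar)
    # Assumption: initial neighbor tables are exact.
    known = [{j: x[j] for j in neighbors[i]} for i in range(n)]
    for _ in range(n_iter):
        s = int(rng.integers(n))
        # The greedy choice is based on stored, possibly stale, values.
        t = max(neighbors[s], key=lambda j: abs(x[s] - known[s][j]))
        # The gossiping pair communicates reliably and averages true values.
        x[s] = x[t] = 0.5 * (x[s] + x[t])
        known[s][t] = known[t][s] = x[s]
        # Both nodes broadcast; every other neighbor may miss the message.
        for u in (s, t):
            for v in neighbors[u]:
                if v not in (s, t) and rng.random() >= p_miss:
                    known[v][u] = x[u]
    return x, np.linalg.norm(x - xbar) / err0, known

n = 50
neighbors = [[(i + k) % n for k in (-3, -2, -1, 1, 2, 3)] for i in range(n)]
rng = np.random.default_rng(2)
x0 = rng.standard_normal(n)       # illustrative initialization

results = {}
for p in (0.0, 0.25, 0.5):
    results[p] = gge_with_stale_info(x0, neighbors, 1000, p, rng)
    print(f"p = {p:.2f}: relative error after 1000 iterations = {results[p][1]:.3e}")
```

Because the averaging step itself always uses the true values, the network average is preserved for every $p$; stale tables only affect which neighbor is selected.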

V. Multi-hop Greedy Gossip with Eavesdropping

As Figs. 2 and 3 indicate, the improvement of GGE over randomized gossip is smaller for the grid topology than for the RGG topology. The decrease in improvement is due to the fact that the node degree in a two-dimensional grid is bounded at 4 and does not increase with the network size. Here we propose an extension to our algorithm that improves the performance of GGE such that it can be employed for topologies with low average node degree. Essentially, this extension allows nodes to perform greedy gossip updates with nodes beyond their immediate one-hop neighborhood.



September 9, 2009 



DRAFT 



Fig. 6. A comparison of the performance of randomized gossip with GGE in the case of stale information due to independent link failures with probability $p$. Results are for 100 realizations of the 200-node random geometric graph, averaged over 100 runs of the algorithm.

In one-hop GGE, at the $k$th iteration, $s_k$ determines which neighbor, $t_k$, has a value most different from its own. In two-hop GGE, instead of completing the update, $t_k$ checks if any of its neighbors $u_k \in \mathcal{N}_{t_k}$ has a value even more different from $s_k$ than its own; i.e., $|x_{s_k} - x_{u_k}| > |x_{s_k} - x_{t_k}|$. If so, nodes $s_k$ and $u_k$ gossip; otherwise $s_k$ and $t_k$ gossip. Multi-hop gossip generalizes this idea to even larger neighborhoods. For example, $u_k$ can search its neighborhood, and so on.
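
The two-hop selection rule can be sketched as below. This is an illustration under assumptions not stated in the text: nodes are taken to know their neighbors' current values exactly, and when several neighbors of $t_k$ qualify, the one most different from $s_k$ is chosen. The function name `select_two_hop_partner` and the example graph are hypothetical.

```python
def select_two_hop_partner(s, x, neighbors):
    """Return the gossip partner of node s under two-hop GGE.

    t is s's one-hop greedy choice. t then checks whether one of its own
    neighbors u differs from s by more than t does; if so, s gossips with u.
    """
    t = max(neighbors[s], key=lambda j: abs(x[s] - x[j]))
    candidates = [u for u in neighbors[t] if u != s]
    if candidates:
        # Assumption: among qualifying neighbors, take the most different one.
        u = max(candidates, key=lambda j: abs(x[s] - x[j]))
        if abs(x[s] - x[u]) > abs(x[s] - x[t]):
            return u
    return t

# Hypothetical 4-node path graph 0 - 1 - 2 - 3 with illustrative values.
path = [[1], [0, 2], [1, 3], [2]]

partner_a = select_two_hop_partner(0, [0.0, 1.0, 5.0, 2.0], path)
partner_b = select_two_hop_partner(0, [0.0, 3.0, 1.0, 2.0], path)
print(partner_a)  # 2: node 2 differs from node 0 more than node 1 does
print(partner_b)  # 1: no neighbor of node 1 differs more, so node 1 is kept
```

Extending the search to three hops would repeat the same check from the selected two-hop node, at the cost of additional transmissions per iteration.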

To observe the effect of performing greedy updates over multiple hops, we conduct an experimental 
comparison between 1-hop, 2-hop, and 3-hop GGE. As a point of comparison, we also include curves for 
randomized gossip and geographic gossip. Figure [7] illustrates the results for grid and random geometric 
graph topologies. In the grid, 3-hop GGE achieves an asymptotic rate of reduction in relative error that is 
comparable to geographic gossip. In the random geometric graph topology, the asymptotic performance 
of 1-hop GGE is already similar to that of geographic gossip (for a network of 200 nodes) and the multi- 
hop versions lead to significant improvements while still limiting all gossip exchanges to be between 
nodes separated by at most three hops. 

[Figure 7 plot: (a) Linearly-varying field convergence rate comparison.]

Fig. 7. A comparison of the performance of randomized gossip, GGE (1-hop), multi-hop GGE (2 and 3 hops) and geographic gossip for linearly-varying field initialization of $x(0)$ in grid topology and Gaussian bumps initialization of $x(0)$ in RGG topology. Results are averaged over 100 runs of the algorithm.

VI. Conclusion

In this paper we propose a new average consensus algorithm for wireless sensor networks. Greedy gossip with eavesdropping (GGE) makes use of the broadcast nature of wireless communications and provides fast and reliable computation of average consensus. We provide (i) a proof that GGE converges to the average consensus; (ii) a bound on the mean-squared error after $k$ iterations of GGE; (iii) a bound on the $\epsilon$-averaging time of GGE; (iv) theoretical bounds suggesting that GGE converges faster than randomized gossip; and (v) a characterization of the improvement in convergence rate achieved by GGE over randomized gossip as a function of the maximum degree. Simulation experiments compare the performance of GGE, randomized gossip [12], and geographic gossip [10] and demonstrate that the theoretical bound on mean-squared error provides a good characterization of the algorithm performance. The simulation experiments also investigate the scaling behavior of the communication complexity of GGE.

GGE retains the robustness and simplicity of randomized gossip; it does not require nodes to acquire 
location information and it does not introduce the overhead of geographic routing. There is an additional 
memory overhead (nodes store their neighbors' values), but this storage requirement is small. Nodes 
do need to learn their neighbors' values, and we propose an initialization process that introduces a 
minor performance penalty with negligible added complexity. Since nodes eavesdrop on their neighbors' 
broadcasts, they must remain in "Receive" mode throughout the entire operation of the GGE algorithm. 
In randomized gossip, nodes can enter "Idle" mode and only switch to "Receive" mode when they 
detect that a neighbor is requesting a data exchange. In a wireless sensor network implementation, 
this difference could lead to concerns that GGE would consume more energy than randomized gossip. 
However, empirical studies have shown that energy consumption in "Idle" and "Receive" modes is very 
similar for most existing wireless sensor network architectures [25].

Our future work will investigate the benefits of GGE in networks of mobile nodes. When nodes are mobile, other fast consensus approaches which exploit knowledge of geographic location are no longer applicable. However, because GGE is purely local and adaptive, we believe it is a promising candidate for accelerating gossip algorithms in time-varying networks. We also plan to investigate further connections between consensus algorithms and incremental subgradient optimization algorithms, towards computing more general functions than the average.